Cacheless Instruction Fetch Mechanism for Multithreaded Processors
Author
Abstract
The speed difference between processors and memories has become one of the biggest problems in designing memory systems. While it primarily limits fast sequential access to data in memory, it also constrains efficient instruction fetch. In computers using single-threaded processors this latter problem has traditionally been partially solved with instruction caches, but in fast multithreaded processors supporting a large number of threads the problem is more difficult, because each thread can execute the program from a unique address (MIMD-style) or all threads can access the same location synchronously (SIMD-style). In this paper we propose two cacheless instruction fetch mechanisms for multithreaded processors, composed of an interthread pipelined instruction fetch unit and a banked instruction memory module using randomized hashing, combining and partitioning. The proposed mechanisms, along with two reference mechanisms based on direct mapped and T-way set associative caching, are evaluated in a T-threaded case by simulations. According to our evaluation the proposed mechanisms efficiently solve the speed difference problem and provide clearly better performance than the reference solutions.

Key-Words: Multithreaded processors, instruction memory, instruction fetching mechanisms, interleaving

… realistic latency memory chips for fast multithreaded processors by applying interthread pipelining to the instruction fetch unit and randomized hashing and combining to the instruction memory module, as done for data memories in [7]. The obtained mechanism should be capable of delivering operation codes at a sustained rate of close to one instruction per clock cycle.

The rest of the paper is organized as follows: In section 2 multithreaded processors and our test architecture are introduced. In section 3 we propose two methods for fetching instructions efficiently in a T-threaded processor despite the high speed difference, and describe two reference solutions based on direct mapped and T-way set associative caching. Evaluation of the proposed methods along with the reference solutions is given in section 4. Finally, in section 5 we give our conclusions.

2. Multithreaded processors

In order to balance computation and communication, new types of processors are needed. This is because conventional single-threaded processors spend most of their time waiting for memory references to complete. Allowing multiple memory references to be in flight simultaneously would help a little, but it would not solve the problem, due to dependencies limiting parallel execution of memory references and the resource constraints of a single thread. Multithreaded processors are processors that have hardware support for the efficient execution of multiple threads, e.g. in the form of fast thread switches and buffers for threads [10]. If a T-threaded processor is connected to an efficient enough pipelined memory system featuring latency less than T, computation and communication will be automatically balanced, because the processor can execute other threads while a thread's memory reference completes.
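As a rough illustration of the ideas in the abstract and in section 2, the toy cycle-level model below is a minimal sketch, not the paper's actual fetch unit, memory module or simulator; the parameters NUM_THREADS, MEM_LATENCY, NUM_BANKS, BANK_BUSY and the hash multiplier are illustrative assumptions, not values from the paper. It shows a round-robin (interthread pipelined) fetch slot per cycle, a randomized multiplicative hash selecting the instruction memory bank for each program counter, and the latency-hiding condition stated above: with memory latency below the thread count T, the sustained fetch rate approaches one instruction per clock cycle.

```python
# Toy model (illustrative only): T-threaded processor with an interthread pipelined
# fetch unit and a banked instruction memory addressed through a randomized hash.
import random

NUM_THREADS = 16      # T hardware threads, serviced round-robin by the fetch unit
MEM_LATENCY = 12      # cycles from issuing a fetch to the opcode being available
NUM_BANKS   = 16      # instruction memory banks
BANK_BUSY   = 1       # cycles a bank stays busy per access (1 = fully pipelined bank)
CYCLES      = 100_000

HASH_MULT = random.getrandbits(32) | 1   # fixed odd multiplier for randomized hashing

def bank_of(addr: int) -> int:
    # Spread instruction addresses (pseudo)randomly over the banks so that neither
    # per-thread PCs (MIMD-style) nor a shared PC (SIMD-style) map systematically
    # to the same bank.
    return (((addr * HASH_MULT) & 0xFFFFFFFF) >> 16) % NUM_BANKS

pcs = [random.randrange(1 << 20) for _ in range(NUM_THREADS)]  # MIMD-style: one PC per thread
bank_free_at  = [0] * NUM_BANKS
fetch_done_at = [0] * NUM_THREADS
outstanding   = [False] * NUM_THREADS
fetched = 0

for cycle in range(CYCLES):
    tid = cycle % NUM_THREADS        # interthread pipelining: each cycle belongs to one thread

    # Complete this thread's previous fetch once its latency has elapsed.
    if outstanding[tid] and fetch_done_at[tid] <= cycle:
        fetched += 1
        pcs[tid] += 1                # sequential fetch for simplicity
        outstanding[tid] = False

    # Issue the next fetch for this thread if it is idle and its bank is free.
    if not outstanding[tid]:
        b = bank_of(pcs[tid])
        if bank_free_at[b] <= cycle:
            bank_free_at[b] = cycle + BANK_BUSY
            fetch_done_at[tid] = cycle + MEM_LATENCY
            outstanding[tid] = True
        # else: bank conflict, the thread retries in its next slot

print(f"sustained fetch rate: {fetched / CYCLES:.3f} instructions per cycle")
```

With these assumed values (MEM_LATENCY < NUM_THREADS, pipelined banks) the printed rate is close to one. Raising MEM_LATENCY to T or more, or BANK_BUSY above one (modelling slower, non-pipelined banks), makes the rate drop below one, which is the imbalance the proposed mechanisms are intended to avoid.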
Similar articles
Control and Data Dependence in Multithreaded Processors
Boosting instruction level parallelism in dynamically scheduled processors requires a large instruction window. The approach taken by current superscalar processors to build the instruction window is known to have important limitations, such as the requirement of more powerful instruction fetch mechanisms and the increasing complexity and delay of the issue logic. In this paper we present a nov...
Dependence Speculative Multithreaded Architecture
Boosting instruction level parallelism in dynamically scheduled processors requires a large instruction window. The approach taken by current superscalar processors to build the instruction window is known to have important limitations, such as the requirement of more powerful instruction fetch mechanisms and the increasing complexity and delay of the issue logic. In this paper we present a nov...
Emulating Unimplemented Instructions in a Simultaneous Multithreaded Processor
Emulating unimplemented instructions can reduce the cost and power requirements of a processor by allowing functional units to be removed. But the handling of unimplemented instruction exceptions in modern processors wastes fetch bandwidth and reduces throughput due to squashed instructions. Simultaneous Multithreaded (SMT) processors can avoid the waste by using multiple thread contexts to han...
An Effective Bypass Mechanism to Enhance Branch Predictor for SMT Processors
Unlike traditional superscalar processors, Simultaneous Multithreaded processor can explore both instruction level parallelism and thread level parallelism at the same time. With a same fetch width, SMT fetches instructions from a single thread not so deeply as in traditional superscalar processor. Meanwhile, all the instructions from different threads share the same Function Units in SMT. All...
The Use of Multithreading for Exception Handling Craig
Figure 1. Traditional vs. Multithreaded Exception Handling. Six instructions have been fetched when an exception is detected on the fourth. Traditionally (a), instructions 4-6 are squashed and must be refetched after the exception handler is fetched. With our multithreaded mechanism (b), a second thread fetches the exception handler (AD), and then the main thread continues to fetch (7,8). The e...